Hyperspectral image classification (HSIC) of remote sensing imagery has advanced considerably with
artificial intelligence techniques. Among deep learning methods, 2D and 3D convolution neural networks
(2D-CNN and 3D-CNN) are widely used to classify the spectral-spatial bands of hyperspectral images (HSI).
The proposed Hybrid 3D-CNN (H3D-CNN) framework extracts deeper features to improve classification
accuracy in supervised learning. The model reduces the narrow gap between supervised and unsupervised
learning as well as the complexity and cost of previous models. The HSI classification analysis is carried
out on the real-world Indian Pines and Salinas datasets captured by Airborne Visible/Infrared Imaging
Spectrometer (AVIRIS) sensors, on which the model achieved superior classification accuracy.

1. INTRODUCTION

Hyperspectral imaging (HSI) research has grown rapidly in recent years owing to its wide range of
applications. Remote sensing captures digital images in hundreds to thousands of narrow spectral bands, with
spectral signatures ranging from visible to near-infrared wavelengths [1]. Imaging spectrometers on remote
sensors produce images that are rich in spectral and spatial information; these images take the form of data
cubes carrying multi-resolution spectral-spatial information, as shown in Figure 1 [2]. Over recent decades,
HSI has been widely applied in agriculture, environmental studies, biology, fraud detection, astronomy, and
mineral exploration [3]. Rather than using only characteristics tied directly to the pixel itself, each pixel
in HSI relies on features from a small area surrounding it. In the context of supervised training and
classification [4], a variety of methods have been used for HSI data classification, including multinomial
logistic regression [5], support vector machine (SVM), distance measures, K-nearest neighbor, and maximum
likelihood [6].

Methods of spectral-spatial classification can be divided into two categories according to whether they use
spectral or spatial contextual information. Advanced spatial feature extraction is achieved using
morphological profiles [7], entropy [8], attribute profiles [9], and low-rank representation [10]. Then, using
dimensionality reduction (DR), these transformed spatial data are combined with spectral features to conduct
pixel-wise classification. Furthermore, the Hopfield neural network in [11] has been applied to hyperspectral
data in remote sensing images. Representing HSI data as 3D cubes yields many feature cubes carrying crucial
information on signal space [12], spectrum, and combined spatial/spectral correlation, all of which are
necessary for improved performance.

According to the existing literature, the convolution neural network has gained popularity in HSI through
2D-CNN and 3D-CNN [13]. The former are used for spectral learning, while the latter learn local spatial
features at each strip [14]. These models exhibit a weakness in feature extraction when applied to
multi-dimensional data cubes. The 3D-CNN approach is overly complex in its computation, which also affects
classification accuracy. Non-linear problems can be handled with kernel-based methods as well. By mapping the
original data onto a higher-dimensional Hilbert space, kernel techniques convert non-linear problems into
linear problems.
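
The kernel idea mentioned above can be illustrated with a toy example. The sketch below is not taken from the paper: the 1-D data, the feature map `feature_map`, and the matching `kernel` function are all hypothetical choices made for illustration. It shows that points which cannot be separated by a threshold in one dimension become linearly separable after the mapping, and that the kernel reproduces the inner products of the mapped points.

```python
import numpy as np

def feature_map(x):
    """Explicit map phi(x) = (x, x^2) from a 1-D input to a 2-D feature space."""
    x = np.asarray(x, dtype=float)
    return np.stack([x, x ** 2], axis=-1)

def kernel(x, z):
    """Kernel k(x, z) = x*z + (x*z)^2, which equals phi(x) . phi(z)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return x * z + (x * z) ** 2

# Toy 1-D data (illustrative only): inner points are class 0, outer points class 1.
# No single threshold on x separates the two classes.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, 0, 0, 1])

phi = feature_map(x)
# In the feature space a linear rule on the second coordinate separates the classes.
pred = (phi[:, 1] > 2.5).astype(int)

# Gram matrix computed two ways: from explicit features and from the kernel.
gram_explicit = phi @ phi.T
gram_kernel = kernel(x[:, None], x[None, :])
```

In practice the explicit map is never formed; a kernel classifier such as an SVM only needs the Gram matrix.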

In this work, we propose a novel deep learning method and provide a relatively general and comprehensive
overview of the existing methods. The motivation of our work is the classification of hyperspectral images
with a method called hybrid 3D-CNN with enhanced features, which considers both spectral and spatial
information. The salient feature of the proposed H3D-CNN model is that it is efficient in hyperspectral image
computing and accurate in classification. 2D-CNN and 3D-CNN alone are not able to extract accurate features
from the HSI [15], which is volumetric data with multi-dimensional features [16], [17]. This motivates us to
propose the Hybrid 3D-CNN. The resulting deep classifier model is trained end-to-end. At the same time, the
Hybrid 3D-CNN parameters are learned in a supervised manner with a limited number of training samples to
increase the accuracy on a large number of testing samples. We evaluated our model on different real-world HSI
datasets.


[Figure: hyperspectral cube and single pixel]


Figure 1. Hyperspectral data cube and spectral-spatial feature 


2. PROPOSED METHOD 
2.1. Novelty of the method 

The convolution neural network is a popular supervised machine learning method for classifying hyperspectral
images, and its use for feature extraction inspired the hybrid model for classification accuracy. In this
section we describe the Hybrid 3D convolution neural network (H3D-CNN) based classification technique in
depth, explain how to train and evaluate this system on hyperspectral images, and compare it with different
methods on publicly available datasets. The model is inspired by state-of-the-art machine learning models for
classification of real-world datasets. Deep learning (DL) techniques are capable of extracting feature
information automatically.


2.2. Architecture model evolution 

The most accessible approach to extracting features from an input image is to use a convolution neural
network (CNN). When a 2D-CNN is applied to a hyperspectral image containing hundreds of spectral dimensions,
convolving each input with the kernels increases the computation cost. As a result, dimensionality reduction
is applied before using the 2D-CNN for feature extraction and classification [18].

Principal component analysis (PCA) extracts features from the hyperspectral image to reduce its dimension
before the 2D-CNN learns in-depth features of each labelled pixel [19]. The high-level features are first
extracted with the PCA algorithm, which retains the spatial information for further classification at the cost
of some spectral information in the 2D-CNN [20]. PCA reduces the number of spectral bands of the data cube
$C \in \mathbb{R}^{W \times H \times D}$, where C is the original input, W is the width, H is the height, and
D is the depth, that is, the number of spectral bands. After the reduction the input becomes
$X \in \mathbb{R}^{W \times H \times B}$ with $B < D$ retained components, where X is the new input to the
convolution neural network with reduced dimensions and no loss of spatial information [21]. The data cube is
divided into small overlapping patches of the scene, and the ground-truth labels associate the reflected
electromagnetic spectrum with class labels.
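
The spectral reduction described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' pipeline: the function name `pca_reduce_cube`, the random stand-in cube, and the choice of 5 retained components are assumptions made for the example.

```python
import numpy as np

def pca_reduce_cube(cube, n_components):
    """Project a (W, H, D) cube onto its leading spectral principal components."""
    w, h, d = cube.shape
    pixels = cube.reshape(-1, d).astype(float)   # one row per pixel, one column per band
    centered = pixels - pixels.mean(axis=0)      # remove the per-band mean
    cov = np.cov(centered, rowvar=False)         # (D, D) covariance between bands
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues come back in ascending order
    top = eigvecs[:, ::-1][:, :n_components]     # reorder so leading components come first
    reduced = centered @ top                     # (W*H, n_components)
    return reduced.reshape(w, h, n_components)   # spatial layout is unchanged

# Synthetic stand-in for a hyperspectral cube (illustrative sizes only).
rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 20))
reduced = pca_reduce_cube(cube, n_components=5)
component_variance = reduced.reshape(-1, 5).var(axis=0)
```

Only the band axis is transformed, so the width and height of the scene are preserved, which matches the statement that the spatial information is kept.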

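In 2D-CNN, the activation value at spatial position (x, y) in the j-th feature map of the i-th layer, denoted
$v_{i,j}^{x,y}$, is given by

$$v_{i,j}^{x,y} = \phi\left( b_{i,j} + \sum_{\tau=1}^{d_{i-1}} \sum_{\rho=-\gamma}^{\gamma} \sum_{\sigma=-\delta}^{\delta} w_{i,j,\tau}^{\sigma,\rho} \, v_{i-1,\tau}^{x+\sigma,\,y+\rho} \right) \qquad (1)$$

where $b_{i,j}$ is the bias of the j-th feature map of the i-th layer, $\phi$ is the activation function,
$d_{i-1}$ is the number of feature maps in the (i-1)-th layer and the depth of the kernel $w_{i,j}$ for the
j-th feature map of the i-th layer, $2\gamma+1$ is the width of the kernel, $2\delta+1$ is the height of the
kernel, and $w_{i,j}$ holds the weight parameters of the i-th layer and j-th feature map.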
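
The 2D activation described above is a bias plus a weighted sum over the input feature maps and a kernel window, passed through an activation function. The sketch below evaluates it at one position with explicit loops. The array layout, the function name `conv2d_value`, and the use of ReLU as the activation are assumptions made for illustration.

```python
import numpy as np

def conv2d_value(prev_maps, weights, bias, x, y):
    """
    Activation at position (x, y) of one output feature map.

    prev_maps: (d_prev, X, Y) feature maps of the previous layer
    weights:   (d_prev, 2*delta + 1, 2*gamma + 1) kernel for this output map
    bias:      scalar bias of this output map
    """
    d_prev, k_h, k_w = weights.shape
    delta, gamma = k_h // 2, k_w // 2
    total = bias
    for tau in range(d_prev):                      # sum over input feature maps
        for sigma in range(-delta, delta + 1):     # kernel height offsets
            for rho in range(-gamma, gamma + 1):   # kernel width offsets
                total += (weights[tau, sigma + delta, rho + gamma]
                          * prev_maps[tau, x + sigma, y + rho])
    return max(total, 0.0)                         # ReLU chosen as the activation here

rng = np.random.default_rng(1)
prev_maps = rng.normal(size=(2, 5, 5))
weights = rng.normal(size=(2, 3, 3))
bias = 0.2

value = conv2d_value(prev_maps, weights, bias, x=2, y=2)
# Same quantity written as a slice product, used as a cross-check.
vectorised = max(bias + np.sum(weights * prev_maps[:, 1:4, 1:4]), 0.0)
```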

As a result, 3D kernels are employed in 3D convolution to extract spectral and spatial information
concurrently for hyperspectral image classification [23]. The required information is convolved using
learnable 3x3 3D kernels in each layer of the hybrid 3D-CNN [24]. The proposed model with two convolution
layers yields the best result. The output of the linear classification layers, after the activation function,
is fed to SoftMax [25] to generate the classification image maps.

In layer I, the 3D-CNN convolves the patches with 2x2x9 filters at a stride of 2, which reduces the
dimensionality. In layers II and III (Figure 2), the filters are 3 and 5 with strides of 1 and 2,
respectively. The spatial feature maps are then passed to the fully connected layer, and SoftMax generates the
predicted output from the reduced feature maps.
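
How strided convolutions shrink the spatial size can be checked with a short calculation. The sketch below uses the kernel and stride labels shown in Figure 2 (2x2 with stride 2, 3x3 with stride 1, 3x3 with stride 2). The 25-pixel input window and the absence of padding are assumptions for the example, since the text does not state the padding scheme.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for a convolution without padding."""
    return (size - kernel) // stride + 1

# (kernel, stride) per layer, read from the Figure 2 labels.
layers = [(2, 2), (3, 1), (3, 2)]

size = 25            # assumed input window size
sizes = [size]
for kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)

print(sizes)
```

Under these assumptions the side length goes from 25 to 12, then 10, then 4.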


$$v_{i,j}^{x,y,z} = \phi\left( b_{i,j} + \sum_{\tau=1}^{d_{i-1}} \sum_{\lambda=-\eta}^{\eta} \sum_{\rho=-\gamma}^{\gamma} \sum_{\sigma=-\delta}^{\delta} w_{i,j,\tau}^{\sigma,\rho,\lambda} \, v_{i-1,\tau}^{x+\sigma,\,y+\rho,\,z+\lambda} \right) \qquad (2)$$

In (2), $v_{i,j}^{x,y,z}$ denotes the 3D-CNN activation value at position (x, y, z) in the j-th feature map of
the i-th layer. The additional sum over $\lambda$ runs across the kernel extent $2\eta+1$ along the spectral
dimension, and the remaining symbols are as defined for the 2D case. The CNN parameters, such as the bias b
and the weights w, are trained using the supervised approach [26]. The 3D convolution neural network extracts
both spatial and spectral features, but it increases the computation and complexity [27]. To retain the
automatic feature learning advantages of both 2D and 3D convolutional neural networks, we propose fusing the
spectral convolutions. The 3D-CNN kernel derives spatial and spectral information simultaneously from HSI
datasets, as shown in Figure 2.
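
A 3D convolution slides the kernel along the spectral axis as well as the two spatial axes, so one pass produces a full output volume. The following sketch computes such a volume with NumPy. The array layout, the function name `conv3d_layer`, unit stride, no padding, and ReLU as the activation are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_layer(prev_maps, weights, bias):
    """
    One output feature map of a 3D convolution (stride 1, no padding).

    prev_maps: (d_prev, X, Y, Z) feature maps of the previous layer
    weights:   (d_prev, k_h, k_w, k_d) kernel for this output map
    bias:      scalar bias of this output map
    """
    # windows has shape (d_prev, X', Y', Z', k_h, k_w, k_d)
    windows = sliding_window_view(prev_maps, weights.shape[1:], axis=(1, 2, 3))
    # Sum over input maps and over all three kernel offsets at every position.
    summed = np.einsum("txyzabc,tabc->xyz", windows, weights)
    return np.maximum(summed + bias, 0.0)   # ReLU chosen as the activation here

rng = np.random.default_rng(2)
prev_maps = rng.normal(size=(2, 5, 5, 7))   # two input maps, 5x5 spatial, 7 bands
weights = rng.normal(size=(2, 3, 3, 3))
bias = 0.1

out = conv3d_layer(prev_maps, weights, bias)
# Cross-check one position against the direct triple sum over the kernel window.
direct = max(bias + np.sum(weights * prev_maps[:, 1:4, 1:4, 2:5]), 0.0)
```

The output has shape (3, 3, 5): the spectral axis shrinks together with the spatial axes, which is what distinguishes the 3D case from the 2D one.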

HSI is volumetric data that carries spectral and spatial information with deep, complex dimensions in
3D-CNN [28]. We therefore propose the Hybrid 3D-CNN model for deep spectral analysis together with spatial
information. The data cube is segmented into spectral-spatial patches, and PCA is applied to reduce
dimensionality and redundancy.
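
Segmenting the cube into spectral-spatial patches can be sketched as below. This is an illustrative version only: the function name `extract_patches`, zero padding at the borders, an odd window size, and the convention that label 0 marks an unlabelled pixel are all assumptions, not details given in the text.

```python
import numpy as np

def extract_patches(cube, labels, window):
    """Return one (window, window, bands) patch per labelled pixel, with its class."""
    margin = window // 2                      # window is assumed to be odd
    padded = np.pad(cube, ((margin, margin), (margin, margin), (0, 0)), mode="constant")
    patches, patch_labels = [], []
    for r, c in zip(*np.nonzero(labels)):     # label 0 is treated as unlabelled
        # In the padded cube, rows r..r+window-1 are centred on original row r.
        patches.append(padded[r:r + window, c:c + window, :])
        patch_labels.append(labels[r, c] - 1) # shift classes to start at zero
    return np.stack(patches), np.array(patch_labels)

# Tiny synthetic cube and ground truth (illustrative only).
cube = np.arange(6 * 6 * 4, dtype=float).reshape(6, 6, 4)
labels = np.zeros((6, 6), dtype=int)
labels[0, 0] = 1
labels[3, 4] = 2

patches, patch_labels = extract_patches(cube, labels, window=5)
```

Each patch keeps the full band axis, and its centre pixel is the labelled pixel being classified.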


[Figure: Hyperspectral Image, PCA, convolution layers 2x2@9 stride 2, 3x3@5 stride 1, 3x3@3 stride 2,
feature maps Feature1 to Feature5, SoftMax]


Figure 2. Hybrid 3D-CNN model 


2.3. Training model 

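In the convolution neural network, the HSI data consist of N distinct training samples
$\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i = [x_{i1}, \ldots, x_{id}]^T \in \mathbb{R}^d$ is the spectral feature
of a training sample and $y_i = [y_{i1}, \ldots, y_{im}]^T \in \mathbb{R}^m$ is the label information
corresponding to sample $x_i$. The number of hidden nodes is denoted L.

$$f(x_j) = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, N \qquad (3)$$

where $w_i = [w_{i1}, \ldots, w_{id}]^T$ is the weight vector linking the input to the i-th hidden node, $b_i$
is the bias of the i-th hidden node, and $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the output weight
vector of the i-th hidden node. Several activation functions can be used, such as the sigmoid and radial basis
functions. The sigmoid is

$$g(w, b, x) = \frac{1}{1 + e^{-(w \cdot x + b)}} \qquad (4)$$

Equation (3) can be written compactly as $H\beta = Y$, where H is the hidden-layer output matrix,
$\beta = [\beta_1, \ldots, \beta_L]^T \in \mathbb{R}^{L \times m}$, and
$Y = [y_1, \ldots, y_N]^T \in \mathbb{R}^{N \times m}$.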
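
The matrix form above can be made concrete with a small numerical sketch. Everything below is illustrative: the sizes, the random weights, and in particular solving for the output weights with a Moore-Penrose pseudo-inverse are assumptions of this example. The text states the linear system but does not say how it is solved.

```python
import numpy as np

def sigmoid(t):
    """Sigmoid activation 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def hidden_output(X, W, b):
    """H[j, i] = g(w_i . x_j + b_i) for N samples (rows) and L hidden nodes (columns)."""
    return sigmoid(X @ W.T + b)

# Illustrative sizes: N samples, d spectral features, L hidden nodes, m classes.
rng = np.random.default_rng(0)
N, d, L, m = 30, 6, 40, 3
X = rng.normal(size=(N, d))
classes = rng.integers(0, m, size=N)
Y = np.eye(m)[classes]                 # one-hot label matrix, shape (N, m)
W = rng.normal(size=(L, d))            # input-to-hidden weights
b = rng.normal(size=L)                 # hidden biases

H = hidden_output(X, W, b)             # shape (N, L)
# Assumption: least-squares solution of H beta = Y via the pseudo-inverse.
beta = np.linalg.pinv(H) @ Y           # shape (L, m)
residual = np.linalg.norm(H @ beta - Y)
predicted = np.argmax(H @ beta, axis=1)
```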
The outputs of the hidden layer are combined through a coefficient $\mu$, which balances the spatial and
spectral information. The parameter $\mu$ is crucial to the reliability of hyperspectral image classification.

$$H = \mu H_s + (1 - \mu) H_w \qquad (5)$$

where $H_s$ and $H_w$ are the hidden-layer outputs corresponding to the spatial feature and the spectral
feature, respectively. The output of the multiple hidden layers of the convolution network is then fed to
SoftMax to predict the class of each sample.
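
The weighted combination of spatial and spectral hidden-layer outputs, followed by SoftMax, can be sketched as follows. The matrices, the output weights `beta`, and the value of the balancing coefficient `mu` are random or arbitrary placeholders chosen for illustration.

```python
import numpy as np

def combine(H_s, H_w, mu):
    """Blend spatial (H_s) and spectral (H_w) hidden outputs with coefficient mu in [0, 1]."""
    return mu * H_s + (1.0 - mu) * H_w

def softmax(scores):
    """Row-wise SoftMax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
H_s = rng.random(size=(4, 5))      # placeholder spatial hidden outputs
H_w = rng.random(size=(4, 5))      # placeholder spectral hidden outputs
beta = rng.normal(size=(5, 3))     # placeholder output weights for 3 classes

H = combine(H_s, H_w, mu=0.6)      # mu = 0.6 is an arbitrary illustrative value
probs = softmax(H @ beta)          # class probabilities per sample
```

Setting `mu` to 1 uses only the spatial output and setting it to 0 uses only the spectral output, so intermediate values trade one source of information against the other.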


3. RESULTS AND DISCUSSION 
3.1. Data depiction 

The sensors collect reflected electromagnetic spectra in narrow spectral bands; this reflected portion
creates a unique spectral signature for the classification of objects. Our 3D-CNN model can exploit the 3D
structure and is compared against state-of-the-art deep learning methods for HSI classification. The publicly
available datasets described below were used for result analysis and classification.


3.1.1. AVIRIS Indian Pines dataset

The Indian Pines dataset was collected by an airborne sensor over north-western Indiana, USA. It has 16
labeled classes, 10249 samples, and 220 spectral bands. The data cover wavelengths from 0.4 to 2.5 um with a
narrow bandwidth. The scene has 145x145 pixels with a 20 m spatial resolution. Of the 220 bands, 20 noisy
bands were removed in pre-processing before training [29]. Creating ground truth and labeling pixels is
time-consuming, so only a few samples are available for research.


3.1.2. Pavia University (PU) dataset

The Pavia University image was captured over northern Italy. It contains 9 ground-truth classes within
610 x 610 pixels, each pixel with a spatial resolution of 1.3 m. The water-absorption bands were removed, and
the remaining 103 bands are used for training and testing [30]. After removing the noisy and other bands, the
9 classes contain 42776 samples in the dataset used for training. The dataset mainly reflects urban landscape
information.


3.1.3. Salinas dataset

The Salinas scene was captured by the AVIRIS sensor over Salinas Valley, California, USA. It comprises
512x217 pixels with 224 spectral bands. The Salinas data have a spatial resolution of 3.7 m per pixel and 16
classes.


3.1.4. Kennedy Space Center (KSC) dataset

AVIRIS sensors collected data over the KSC, Florida, in 224 bands of 10 nm width with centre wavelengths
from 400 to 2500 nm, from an altitude of around 20 km and with a spatial resolution of approximately 18 m.
After elimination of the low-SNR bands and the water-absorption bands, 176 bands were used for the analysis.


3.2. Model testing results analysis 

Outcomes of the Indian Pines scene: The H3D-CNN model in Figure 2 is used to classify each pixel from a
7x7x200 patch. The input includes 200 spectral bands, handled as channels by the 2D and 3D convolution layers
of size 3x3. The stride is set to 2,2 in the first layer, and the hidden layers created from the first to the
last stride in the convolution network are 5x5, 3x3, and 1x1, respectively. Finally, the 16 classified labels
are rendered in the image map using SoftMax. The model uses the Adam optimizer, cross-entropy as the loss
function, and the ReLU activation function; among the different kernel sizes tried, 3x3 performed best. The
four layers of the convolution network were used with filters of sizes 8, 16, and 32, respectively.
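
The two training ingredients named above, the ReLU activation and the cross-entropy loss, can be written in a few lines of NumPy. This is a generic sketch rather than the authors' training code; the function names and the toy probabilities are chosen for illustration, and the small `eps` clip is an assumption added to avoid taking the logarithm of zero.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative values are set to zero."""
    return np.maximum(x, 0.0)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted class probabilities and integer labels."""
    probs = np.clip(probs, eps, 1.0)                     # guard against log(0)
    picked = probs[np.arange(len(labels)), labels]       # probability of the true class
    return float(-np.mean(np.log(picked)))

# Two samples, two classes (toy values).
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)
activated = relu(np.array([-1.0, 2.0]))
```

The loss falls as the probability assigned to the true class rises, which is the quantity an optimizer such as Adam would minimise.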

The Hybrid 3D-CNN model is depicted in Figure 2, and Table 1 reports the results of the different methods
on the different datasets. Principal component analysis (PCA) lowers the dimensionality of the data without
losing spatial information, while the SVM classifier works on the reduced data without losing spectral
information but ignores the spatial-spectral context. Second, 2D-CNN achieves performance comparable to SVM;
next, the 3D-CNN method performs well relative to the other methods; and finally, the H3D-CNN method achieves
the best average accuracy. Table 2 reports the kappa coefficient, which measures the agreement between the
classified maps and the ground truth, together with the average accuracy (AA) and overall accuracy (OA) for
the different methods. Figure 3 and Figure 4 show the classified outcomes for the IP dataset and the SA
dataset, respectively, and the accuracy of each class label is presented in Table 1.
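
The three summary metrics used in the tables (OA, AA, and the kappa coefficient) can all be derived from a confusion matrix. The sketch below uses the standard definitions with a hypothetical 2-class matrix whose rows are true classes and whose columns are predicted classes; it does not reproduce any value from the tables.

```python
import numpy as np

def accuracy_metrics(confusion):
    """Return (overall accuracy, average accuracy, Cohen's kappa) from a confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    oa = np.trace(confusion) / total                          # correctly classified fraction
    per_class = np.diag(confusion) / confusion.sum(axis=1)    # accuracy of each true class
    aa = per_class.mean()                                     # unweighted mean over classes
    # Agreement expected by chance, from the row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
confusion = [[8, 2],
             [1, 9]]
oa, aa, kappa = accuracy_metrics(confusion)
```

OA weights every sample equally, AA weights every class equally, and kappa discounts the agreement that would occur by chance, which is why the three values can differ on imbalanced datasets.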

Table 1. Classification accuracy (%) on the Indian Pines and Salinas datasets

Indian Pines
No.    Class label                    Samples   H3D-CNN   3D-CNN   2D-CNN   SVM
1      Alfalfa                             46     88.90    85.71    71.72    85.71
2      Corn-notill                       1428     92.11    96.46    95.85    86.82
3      Corn-mintill                       830     96.08    97.13    95.90    86.12
4      Corn                               237     97.90    98.55    73.91    88.40
5      Grass-pasture                      483     87.33    97.90    97.20    95.10
6      Grass-trees                        730     99.82    97.68    96.31    98.61
7      Grass-pasture-mowed                 28     81.81   100      100       75.00
8      Hay-windrowed                      478    100       99.30   100       98.60
9      Oats                                20    100      100      100      100
10     Soybean-notill                     972     90.23    98.26    97.20    87.19
11     Soybean-mintill                   2455     97.80    98.77    99.04    91.01
12     Soybean-clean                      593     98.10    97.15    95.45    94.84
13     Wheat                              205     88.96    96.72   100      100
14     Woods                             1265     99.02    99.46    98.94    96.81
15     Buildings-Grass-Trees-Drives       386     96.08    93.80    94.73    81.57
16     Stone-steel-Towers                  93     97.03   100      100      100
Total                                   10249
AA                                                97.53    97.31    96.37    85.23
OA                                                98.29    98.92    97.08    86.55

Salinas
No.    Class label                    Samples   H3D-CNN   3D-CNN   2D-CNN   SVM
1      Brocoli-green-weeds-1              804    100       97.50    99.50    99.83
2      Brocoli-green-weeds-2             1490    100       99.46    99.82   100
3      Fallow                             790    100       97.80    98.31   100
4      Fallow-rough-plow                  558     99.73    97.13    97.61    99.04
5      Fallow-smooth                     1071     99.85    98.80    98.50    99.75
6      Stubble                           1584    100       98.91    99.75   100
7      Celery                            1432     99.75    96.55    98.42    99.91
8      Grapes-untrained                  4509     95.80    97.28    99.47    92.69
9      Soil-vinyard-develop              2481     99.97    99.84    99.89   100
10     Corn-senesced-green-weeds         1311     98.24    98.78    99.70    99.08
11     Lettuce-romaine-4wk                427     99.64    99.06    99.37   100
12     Lettuce-romaine-5wk                771    100       99.14    98.09   100
13     Lettuce-romaine-6wk                366    100       90.55    96.00    99.64
14     Lettuce-romaine-7wk                428     98.94    93.46    96.89    98.75
15     Vinyard-untrained                 2907     95.23    97.47    99.22    82.61
16     Vinyard-vertical-trellis           723     99.58    99.06    99.41    99.82
Total                                   54129
AA                                                99.85    98.65    98.90    97.37
OA                                                99.67    99.08    98.96    94.95


Table 2. Classification accuracy of different methods

Methods     Indian Pines dataset          Salinas dataset
            OA       AA       Kappa       OA       AA       Kappa
SVM         85.30    79.03    83.10       92.95    94.60    92.11
2D-CNN      89.48    86.14    87.96       97.38    98.84    97.08
3D-CNN      91.10    91.58    89.98       93.96    97.01    93.32
H3D-CNN     98.30    96.53    98.06       99.67    99.85    99.64


The training and validation loss and accuracy were analyzed over 50 to 100 epochs for the fully connected
layers 1 and 2. We used a batch size of 256 and a dropout rate of 0.4%. For the Indian Pines and Salinas
datasets, Table 2 reports the results of the different techniques in terms of AA, OA, and the kappa
coefficient. Across the datasets, the 3D-CNN and the 2D-CNN, which use spectral and spatial information,
perform comparably.

The classification maps of the Indian Pines and Salinas hyperspectral images are shown in Figure 3 and
Figure 4. We used the SVM, 2D-CNN, 3D-CNN, [30], and H3D-CNN methods. The quality of the H3D-CNN classified
image is better than that of the other models. The per-class accuracy for the different datasets is shown in
Figure 5.

The confusion matrix of the Salinas classified hyperspectral image is shown in Figure 6. The accuracy
and loss convergence over 50 epochs of training and validation are shown in Figure 7. In terms of the
computational efficiency of the H3D-CNN in training and testing, a window size of 25x25 gave the best outcome
among the spatial dimensions tried, with 10% of the samples used for training.

We trained our model using Keras, Scikit, and TensorFlow on a single AMD Radeon 1.60 GHz 4GB GPU and on
Google Colab. We compared H3D-CNN with SVM, 2D-CNN, and 3D-CNN. Table 1 summarises the accuracy and
performance on the Indian Pines (IP) and Salinas (SA) datasets. The AA and OA obtained in the evaluation of
training and test outcomes are 98.53% (AA) and 98.29% (OA) for Indian Pines, and 99.85% (AA) and 99.67% (OA)
for Salinas, with 50 epochs of training. As Table 2 illustrates, a spatial window of 19 x 19 is used for the
proposed H3D-CNN model, and the results were computed with only 10% of the total samples as training data, a
setting in which the proposed model still outperforms the other methods.
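
Training on 10% of the labelled samples requires a split that still covers every class, including very small ones. The sketch below draws such a split per class. Stratifying by class, keeping at least one training sample per class, and the function name `stratified_split` are assumptions of this example; the text does not describe how the 10% subset was drawn.

```python
import numpy as np

def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx) with roughly train_fraction of each class in training."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)        # positions of this class
        rng.shuffle(idx)                           # random order within the class
        n_train = max(1, int(round(train_fraction * idx.size)))
        train_idx.extend(idx[:n_train].tolist())
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

# Toy label vector with three imbalanced classes (illustrative only).
labels = np.repeat([0, 1, 2], [50, 30, 20])
train_idx, test_idx = stratified_split(labels, train_fraction=0.1)
```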

Figure 3. The Indian Pines classification SVM, 2D-CNN, 3D-CNN, H3D-CNN predicted map with labels

Figure 4. Salinas classification SVM, 2D-CNN, 3D-CNN, H3D-CNN predicted map with labels


Indonesian J Elec Eng & Comp Sci, Vol. 29, No. 1, January 2023: 295-303 


Indonesian J Elec Eng & Comp Sci ISSN: 2502-4752


[Plot: Accuracy (%) versus Class Labels (1 to 16); series: Indian_pines, Salinas DS]

Figure 5. Indian Pines, Salinas accuracy

Figure 7. Salinas datasets training and validation loss and accuracy


4. CONCLUSION

This paper proposes an HSI classification architecture with a reduced-dimensional Hybrid 3D-CNN model and
demonstrates its overall performance in training and evaluation. The H3D-CNN framework relies exclusively on
labelled spatial and spectral knowledge for HSI analysis. In future studies, we intend to explore
theoretically more powerful HSI classification approaches based on H3D-CNN that can be used for unlabelled
samples. Unlabelled samples are much easier to obtain in HSI than labeled samples. To make better use of such
unlabelled samples, including in the classification process based on 3D-CNN, the 3D convolution spectral
classification method needs to be enhanced.